fix: resolve .env from CWD and stop _do_deploy env mutation#236
fix: resolve .env from CWD and stop _do_deploy env mutation#236
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes environment variable loading and deployment-time env injection to ensure .env files are resolved from the user’s working directory and to prevent config drift caused by mutating hashed configuration fields during deploy.
Changes:
- Resolve
.envfrom CWD viafind_dotenv(usecwd=True)for bothload_dotenv()anddotenv_values(). - Stop mutating
self.envduring_do_deploy; inject runtime env vars intoself.template.envvia a new_inject_template_env()helper. - Add/adjust unit tests to cover the new injection behavior and updated dotenv import/call patterns.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/runpod_flash/__init__.py |
Loads .env via find_dotenv(usecwd=True) before other imports. |
src/runpod_flash/core/resources/environment.py |
Reads .env values from a CWD-upwards search instead of package-relative resolution. |
src/runpod_flash/core/resources/serverless.py |
Adds _inject_template_env() and switches deploy-time injections to template env to avoid env hash drift. |
src/runpod_flash/core/resources/load_balancer_sls_resource.py |
Injects FLASH_ENDPOINT_TYPE into template env instead of mutating self.env. |
tests/unit/test_dotenv_loading.py |
Updates assertions to match new dotenv import/call pattern. |
tests/unit/resources/test_serverless.py |
Adds tests validating template env injection and non-mutation of self.env during deploy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| has_explicit_template_env = ( | ||
| not new_config.env and new_config.template.env is not None |
There was a problem hiding this comment.
has_explicit_template_env is computed as not new_config.env and new_config.template.env is not None, but PodTemplate.env defaults to an empty list (not None). That makes has_explicit_template_env effectively always true whenever new_config.env is {}, forcing skip_env=False and causing update_template() to send an empty env list that can wipe platform-injected vars (e.g. PORT/PORT_HEALTH) and trigger rolling releases. Consider detecting whether env was explicitly set on the template (e.g. via Pydantic’s model_fields_set / __pydantic_fields_set__) instead of checking is not None.
| has_explicit_template_env = ( | |
| not new_config.env and new_config.template.env is not None | |
| template_fields_set = getattr( | |
| new_config.template, "__pydantic_fields_set__", set() | |
| ) | |
| has_explicit_template_env = ( | |
| not new_config.env and "env" in template_fields_set |
c5916e6 to
405f046
Compare
081d553 to
788808e
Compare
dotenv_values() and load_dotenv() without arguments use find_dotenv() which walks up from the calling file's directory. For editable installs, this resolves the library's dev .env (PYTHONPATH=src, FLASH_HOST=localhost) instead of the user's project .env (HF_TOKEN, RUNPOD_API_KEY, etc). Pass find_dotenv(usecwd=True) so the search starts from CWD (the user's project directory) in both __init__.py and environment.py.
…rift _do_deploy injected runtime env vars (RUNPOD_API_KEY, FLASH_MODULE_PATH, FLASH_ENDPOINT_TYPE) directly into self.env, which is a hashed field. This caused config drift detection on every subsequent deploy, triggering unnecessary rolling releases. Add _inject_template_env() helper that appends KeyValuePairs to self.template.env instead. Runtime injections now go into the template (which is excluded from hashing) while self.env stays clean for drift detection.
The old self.env["FLASH_ENDPOINT_TYPE"] = "lb" was dead code — env is in _input_only, so model_dump excluded it from the API payload. The refactor to _inject_template_env made it actually reach the worker, which triggered is_flash_deployment() -> maybe_unpack() -> artifact not found error for flash run (live serverless) endpoints. For flash deploy, the runtime resource_provisioner already sets FLASH_ENDPOINT_TYPE=lb. This injection point is not needed.
When updating an endpoint, the saveTemplate mutation previously always sent the user's env vars, which overwrote platform-injected vars like PORT and PORT_HEALTH on LB endpoints. This triggered unnecessary rolling releases. Now _build_template_update_payload accepts skip_env; update() compares old vs new env and omits env from the template payload when unchanged, letting the platform preserve its injected vars.
The comment said env vars were excluded from the hash, but they are included. The exclude set only contains RUNTIME_FIELDS and EXCLUDED_HASH_FIELDS (id), not env.
…v overwrite Extract _inject_runtime_template_vars() from _do_deploy so both initial deploy and update() paths inject RUNPOD_API_KEY and FLASH_MODULE_PATH into template.env. Without this, runtime vars set during _do_deploy were silently dropped when update() overwrote the template env on config drift. Also preserve explicit template.env entries when env dict is empty on both sides.
788808e to
e69e1c3
Compare
runpod-Henrik
left a comment
There was a problem hiding this comment.
1. Root cause fix — correct
Both bugs addressed cleanly:
find_dotenv(usecwd=True)in__init__.pyandenvironment.pyfixes editable-install.envresolution. The old call with no arguments resolved relative to the package source, which meant dev installs loaded the library's own.env(PYTHONPATH=src,FLASH_HOST=localhost) instead of the user's project.- Moving runtime var injection from
self.env→self.template.envvia_inject_template_env()stops the false drift loop.self.envstays unchanged across deploys, so the hash is stable.
2. FLASH_ENDPOINT_TYPE removal from LB deploy
The deletion of self.env["FLASH_ENDPOINT_TYPE"] = "lb" in load_balancer_sls_resource.py has a comment explaining the intent — resource_provisioner sets it for flash deploy, and it must NOT be set for flash run (would trigger artifact unpacking that doesn't exist for live endpoints). This is the correct fix, but it's worth confirming the resource_provisioner path is exercised by a test so this split behavior is locked in.
3. Question: has_explicit_template_env misses one case
has_explicit_template_env = (
not new_config.env and new_config.template.env is not None
)
skip_env = env_unchanged and not has_explicit_template_envThe condition only checks not new_config.env (env is falsy). If a user has env={"HF_TOKEN": "x"} (truthy) AND explicit template.env entries, has_explicit_template_env is False even when template.env has content. If env_unchanged=True in that case, skip_env=True → the explicit template.env entries are silently dropped from the update payload.
This is a narrow edge case (requires explicitly constructing template.env outside the normal path), but if it's possible, the condition should be new_config.template.env is not None without the not new_config.env guard.
4. Testing
_inject_template_env, _inject_runtime_template_vars, and _build_template_update_payload(skip_env=...) all have unit tests (524 lines added). Coverage at the helper level is solid.
One gap: no integration test for the update() path that confirms skip_env=True is actually triggered when env_unchanged=True — the existing test_update_calls_save_template_with_resolved_template_id tests that update_template is called, but not that it omits env when unchanged. The behavior relies on the helper tests + code inspection, not an end-to-end flow test through update().
Verdict
PASS with one question to confirm (item 3 — template.env drop edge case) and a suggested test addition (item 4). The core fixes for both bugs are correct.
Summary
dotenv_values()andload_dotenv()now usefind_dotenv(usecwd=True)to walk up from CWD (user's project) instead of from the package source location. Editable installs were loading the library's dev.env(PYTHONPATH=src, FLASH_HOST=localhost) instead of the user's project.env(HF_TOKEN, RUNPOD_API_KEY, etc).self.template.envvia a new_inject_template_env()helper instead of mutatingself.env. Sinceenvis a hashed field, the old mutation caused false config drift on every subsequent deploy, triggering unnecessary rolling releases.Test plan
_inject_template_env(adds KV pair, idempotent, no-op with None template, initializes empty env list, non-mutation of self.env, API key injection into template, LB module path + endpoint type injection)make quality-checkflash run --auto-provisionon flash-examples/03_mixed_workers provisions with correct env vars, no rolling release on first request